Log-domain Speech Feature Enhancement Using Sequential MAP Noise Estimation and a Phase-sensitive Model
Abstract
In this paper we present an MMSE (minimum mean square error) speech feature enhancement algorithm that capitalizes on a new probabilistic, nonlinear environment model, one that effectively incorporates the phase relationship between the clean speech and the corrupting noise in acoustic distortion. The MMSE estimator based on this phase-sensitive model is derived, and it achieves high efficiency by using a single-point Taylor series expansion to approximate the joint probability of clean and noisy speech as a multivariate Gaussian. As an integral component of the enhancement algorithm, we also present a new sequential MAP-based nonstationary noise estimator. Experimental results on the Aurora2 task demonstrate the importance of exploiting the phase relationship in the speech corruption process captured by the MMSE estimator. The phase-sensitive MMSE estimator reported in this paper performs significantly better than phase-insensitive spectral subtraction (54% error rate reduction), and also noticeably better than the phase-insensitive MMSE estimator reported in [2], our previous state-of-the-art technique (7% error rate reduction), under otherwise identical experimental conditions of speech recognition.
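To make the phase relationship concrete, the following is a minimal sketch of the log-domain relation for a single frequency bin, not the paper's implementation. It assumes noisy speech is the complex sum of clean speech and noise, so that the log-power features satisfy y = x + log(1 + e^(n-x) + 2α e^((n-x)/2)), where α is the cosine of the phase angle between speech and noise. The function name `noisy_log_power` and the numeric values are hypothetical. For Mel filter-bank features the phase term becomes a weighted average over bins, which this sketch does not model.

```python
import cmath
import math


def noisy_log_power(x, n, alpha):
    """Log-power of noisy speech in one frequency bin (illustrative sketch).

    x, n  : log-power of clean speech and of noise
    alpha : cosine of the phase angle between speech and noise;
            alpha = 0 gives the phase-insensitive approximation
    """
    r = math.exp(n - x)  # noise-to-speech power ratio
    return x + math.log(1.0 + r + 2.0 * alpha * math.sqrt(r))


# Hypothetical complex spectra for one bin (magnitude, phase in radians).
X = cmath.rect(2.0, 0.3)   # clean speech
N = cmath.rect(0.5, 1.7)   # corrupting noise
Y = X + N                  # additive corruption

x = math.log(abs(X) ** 2)
n = math.log(abs(N) ** 2)
alpha = math.cos(cmath.phase(X) - cmath.phase(N))

y_direct = math.log(abs(Y) ** 2)                    # ground truth for this bin
y_phase_sensitive = noisy_log_power(x, n, alpha)    # uses the phase term
y_phase_insensitive = noisy_log_power(x, n, 0.0)    # drops the phase term

print(y_direct, y_phase_sensitive, y_phase_insensitive)
```

With the phase term included, the model value matches the directly computed log-power of the sum; setting α to zero leaves a residual error, which is the mismatch a phase-insensitive model has to absorb.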
Related resources
A phase-averaged model for the relationship between noisy speech, clean speech and noise in the log-mel domain
In this work, we demonstrate that the most widely-used model for the relationship between noisy speech, clean speech and noise in the log-Mel domain is inaccurate due to its disregard of the phase. Moreover, we show how a more exact model can be derived by averaging over the phase in the log-Mel domain, and how this can profitably be applied to particle filter based sequential noise compensatio...
Noisy Speech Feature Estimation on the Aurora2 Database using a Switching Linear Dynamic Model
This paper presents an approach to enhance speech feature estimation in the log spectral domain under additive noise environments. A switching linear dynamic model (SLDM) is explored as a parametric model for the clean speech distribution, enforcing a state transition in the feature space and capturing the smooth time evolution of speech conditioned on the state sequence. Experimental results u...
A Nonlinear Observation Model from Corrupted Speech Log Me
In this paper we present a new statistical model, which describes the corruption to speech recognition Mel-frequency spectral features caused by additive noise. This model explicitly represents the effect of unknown phase together with the unobserved clean speech and noise as three hidden variables. We use this model to produce noise robust features for automatic speech recognition. The model i...
An analytic derivation of a phase-sensitive observation model for noise robust speech recognition
In this paper we present an analytic derivation of the moments of the phase factor between clean speech and noise cepstral or log-mel-spectral feature vectors. The development shows, among others, that the probability density of the phase factor is of sub-Gaussian nature and that it is independent of the noise type and the signal-to-noise ratio, however dependent on the mel filter bank index. F...
Distributed multichannel speech enhancement with minimum mean-square error short-time spectral amplitude, log-spectral amplitude, and spectral phase estimation
In this paper, the authors present optimal multichannel frequency domain estimators for minimum mean-square error (MMSE) short-time spectral amplitude (STSA), log-spectral amplitude (LSA), and spectral phase estimation in a widely distributed microphone configuration. The estimators utilize Rayleigh and Gaussian statistical models for the speech prior and noise likelihood with a diffuse noise f...